Linear algebra is one of the most useful parts of mathematics for anyone in STEM. Its applications range from circuits to regression to social network analysis to Google’s PageRank algorithm. If we were to build a network whose edges connect related topics in STEM, linear algebra would very much be the central hub. This should come as no surprise, since we like to linearize models to make them more tractable.
The goal of this workshop is to expose you to many core concepts in linear algebra. We’re going for more breadth than depth here and focusing on building intuition.
Simply put, a vector is a list of numbers. We say the dimension of the vector is the number of elements in it. Vectors come in two flavors: column and row vectors.
Much of vector arithmetic is pretty straightforward. Addition and subtraction happen how you’d expect: component-wise. When we multiply a vector by a scalar (a single number), we multiply each component by the scalar, which, well, rescales the vector.
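The workshop doesn’t tie itself to a language, but these operations are easy to try out. Here is a minimal sketch using NumPy (an assumption on my part; the vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Addition and subtraction are component-wise
s = x + y          # array([5., 7., 9.])
d = x - y          # array([-3., -3., -3.])

# Multiplying by a scalar rescales each component
r = 2 * x          # array([2., 4., 6.])
```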
We can even multiply vectors component-wise, but this is not the only notion of vector multiplication. There is the cross product, which we won’t go into; the dot product, also called the inner product; and the outer product.
The transpose of a vector turns a column vector to a row vector, and vice versa. Let \(x = (x_1, x_2, \ldots, x_n)\) and \(y = (y_1, y_2, \ldots, y_n)\). The transpose of \(x\) is written as \(x^T\), and we define the inner product of \(x\) and \(y\) to be
\[ x^T y = \sum_{i=1}^n x_i y_i \]
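The inner product formula above is a one-liner in code. A quick NumPy sketch (NumPy assumed, example vectors made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Inner product x^T y: sum of component-wise products
ip = x @ y               # 1*4 + 2*5 + 3*6 = 32
same = np.sum(x * y)     # the same sum, written out
```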
One way to define a matrix is that it is a thing with two indices. A data frame, for example, is a matrix because each entry is found by specifying a row number and a column number. The dimension of a matrix is given by two (ordered) numbers: the number of rows followed by the number of columns. Taking this viewpoint of a matrix, we can write the general form of a \(3\times 2\) matrix \(A\) as
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} . \]
Another convenient perspective on a matrix is that it is a list of column vectors. In this way, we can write the above matrix compactly as \(A = (a_1, a_2)\), where \[ a_1 = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix}, \quad \text{and} \quad a_2= \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix}. \]
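The column-vector perspective can be made concrete by building a matrix from its columns. A small sketch, assuming NumPy with illustrative numbers:

```python
import numpy as np

# Build a 3x2 matrix from its two column vectors a1 and a2
a1 = np.array([1.0, 2.0, 3.0])
a2 = np.array([4.0, 5.0, 6.0])
A = np.column_stack([a1, a2])

# dim(A) is rows x columns: 3 x 2
# Slicing A by column recovers the original vectors
first_col = A[:, 0]
```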
Much of what we covered in vector arithmetic carries over to matrices. Addition and subtraction is done component-wise. Again, we can do component-wise multiplication but there is a much more interesting way to multiply matrices. The way you usually see it defined is
\[ (AB)_{ij} = \sum_{k=1}^p a_{ik}b_{kj} \] Notice by this definition the number of columns in \(A\) must equal the number of rows in \(B\). So if \(\dim(A) = n\times p\) and \(\dim(B) = p\times m\), it follows that \(\dim(AB) = n\times m\).
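The dimension rule and the entry-wise sum can both be checked numerically. A sketch assuming NumPy, with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # dim(A) = n x p = 4 x 3
B = rng.standard_normal((3, 2))   # dim(B) = p x m = 3 x 2

C = A @ B                         # dim(AB) = n x m = 4 x 2

# One entry, computed directly from the definition
c01 = sum(A[0, k] * B[k, 1] for k in range(3))
```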
There is another perspective we can take on matrix multiplication. Notice that the above sum is just an inner product between the \(i\)th row of \(A\) and the \(j\)th column of \(B\). So really each element of a product of two matrices is computed with an inner product.
Another way to look at matrix multiplication is by first looking at matrix-vector multiplication. So take the dimension of \(B\) to be \(p \times 1\), making it a column vector with components \(b_k\). Then the above formula becomes
\[ (AB)_{i} = \sum_{k=1}^p a_{ik}b_k, \quad \text{or, collecting components,} \quad AB = \sum_{k=1}^p b_k a_k, \] where \(a_k\) is the \(k\)th column of \(A\). In other words, \(AB\) is a linear combination of the columns of \(A\).
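This "linear combination of columns" view is worth verifying by hand. A NumPy sketch (NumPy and the sample numbers are my own choices):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
b = np.array([10.0, 100.0])

# The matrix-vector product...
Ab = A @ b

# ...equals b[0] times the first column plus b[1] times the second
combo = b[0] * A[:, 0] + b[1] * A[:, 1]
```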
Why is this approach important? Well, consider an additive model with \(p\) predictors. We can write out the model for the \(i\)th observation like so:
\[ y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_px_{ip} + \varepsilon_i \] Now, if we wrote out the model for all \(n\) observations it would get a little out of hand:
\[\begin{align} y_1 &= \beta_0 + \beta_1x_{11} + \beta_2x_{12} + \cdots + \beta_px_{1p} + \varepsilon_1\\ y_2 &= \beta_0 + \beta_1x_{21} + \beta_2x_{22} + \cdots + \beta_px_{2p} + \varepsilon_2\\ &\hspace{2mm}\vdots\\ y_n &= \beta_0 + \beta_1x_{n1} + \beta_2x_{n2} + \cdots + \beta_px_{np} + \varepsilon_n \end{align}\]
If we let the first column of \(X\) be all ones (to carry the intercept \(\beta_0\)) and the remaining columns hold the predictors, the above system of equations can be written in the much more compact form of
\[ y = X\beta + \varepsilon \] where \(y = (y_1, y_2, \ldots, y_n)^T\), \(\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T\), and \(\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T\). The matrix \(X\) is called the design matrix.
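To make the design matrix concrete, here is a hedged sketch assuming NumPy: synthetic predictors, a column of ones for the intercept, and a least-squares fit (the coefficient values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
predictors = rng.standard_normal((n, p))

# Design matrix X: a column of ones, then the p predictors
X = np.column_stack([np.ones(n), predictors])

# Simulate y = X beta + epsilon with known beta
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Least-squares estimate of beta from the design matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With only mild noise, `beta_hat` should land close to `beta_true`.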
Like real numbers, matrices can have a multiplicative inverse. That is, the inverse of a square matrix \(A\), written \(A^{-1}\), is the matrix that when multiplied by \(A\) returns the identity matrix. Also like real numbers, not every matrix has an inverse. Unlike real numbers, matrix multiplication is in general not commutative: \(AB \neq BA\).
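Both claims (inverses exist for some matrices, and products don’t commute) can be checked on small examples. A NumPy sketch with arbitrary \(2\times 2\) matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# A has an inverse; A @ A_inv is the identity (up to rounding)
A_inv = np.linalg.inv(A)
I = A @ A_inv

# Multiplication is not commutative: AB and BA differ here
AB = A @ B
BA = B @ A
```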
BTW: The outer product of a vector \(x\) with a vector \(y\) is defined to be \(x y^T\). Note that \(\dim(x) = n\times 1\) and \(\dim(y^T) = 1 \times n\) and hence \(\dim(x y^T) = n\times n\). Letting \(A = x y^T\) we have,
\[ (A)_{ij} = x_i y_j \]
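The outer product formula translates directly to code. A sketch assuming NumPy, with made-up vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Outer product x y^T: an n x n matrix with entries x_i * y_j
A = np.outer(x, y)
```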
Key ideas ahead: linear combination, linear independence, span, basis.

#### Linear Combinations
So we now know a matrix can be thought of as something that sends a vector to another vector. Consider the plot below where a matrix is acting on a lattice, where each point represents the head of a vector. Notice that most of the vectors have changed direction, but not all of them. There are a few that only change magnitude, not direction.
It turns out these are very special vectors associated with the matrix, called eigenvectors. Since they don’t change direction, you can convince yourself that they satisfy the eigenvalue equation:
\[ A x = \lambda x \] where \((\lambda, x)\) are called an eigenpair of \(A\).
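Eigenpairs are easy to compute and verify numerically. A sketch assuming NumPy, using a small symmetric matrix chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigvals[k] pairs with the k-th column of eigvecs
eigvals, eigvecs = np.linalg.eig(A)

# Check the eigenvalue equation A x = lambda x for one pair
lam, x = eigvals[0], eigvecs[:, 0]
lhs = A @ x
rhs = lam * x
```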
Applications to explore: regression, Lotka-Volterra, PCA.